Abstract: A model of a quantum information source is proposed, based on the Gibbs ensemble of ideal (free) particles (bosons or fermions). We identify the (thermodynamic) von Neumann entropy as the information rate and establish the classical Lempel-Ziv universal coding algorithm in Grassberger's form for such a source. This generalises the Schumacher theorem to the case of non-IID qubits.



1. Introduction and basic facts 




In classical information theory, the fundamental unit is a 'bit', and the model behind it is a random variable taking values 0 and 1 with probability 1/2. We often refer to a sequence of random variables as a source. Note that the physics of the way in which the random variables are generated is irrelevant, and that results on data compression rely only on the statistics of long 'strings'.

In the newer quantum information theory, the fundamental unit is a 'qubit', which is associated with a two-dimensional complex Hilbert space. Here the structure is much richer, since states can be not only $|0\rangle$ or $|1\rangle$ but any complex linear combination of the two. However, the definition of a general quantum source producing a sequence of qubits remains open.

So far, the theory of quantum data compression has confined itself to the case of qubits emitted by an IID (independent identically distributed) source. Here, a qubit is a general $2\times2$ density matrix $\sigma$, and the assumption of independence is that the state of $n$ qubits is described by the tensor power $\sigma^{\otimes n}$. IID qubits can be implemented as photon pulses emitted by a laser. However, this model does not allow natural entanglement, and hence lacks interesting physical properties. Even the most enthusiastic proponents of modern quantum information theory consider the IID assumption as "an unfortunate restriction" (see Nielsen and Chuang [N-C], p. 554). It was noted that attempts to reliably produce a qubit string by using various "random processes, such as the preparation and detection of photon pairs ... or atoms in thermal beam... suffer from inescapable signal degradation, ... as the probability of randomly generating the appropriate conditions decreases exponentially" ([Q], p. 256).

On the other hand, in practice, qubits can be modelled by using physical particles or spins: electrons or atoms. Recent experimental results in quantum entanglement (see Sackett et al. [S]) indicate that perhaps the most reliable way to prepare a string of qubits is to couple quantum particles in a coherent way. In the experiment reported in [S], these were $^9\mathrm{Be}^+$ ions interacting, approximately, via a Dicke-Lamb type potential and arranged in a one-dimensional lattice. A similar approach was put forward in [J-K-P]. The most recent experiments on the physical implementation of Shor's quantum factorisation algorithm also use quantum particle systems as the material base of a computational device [Q]. For a mathematician, this stimulates interest in rigorous analysis of information coding methods for sources represented by ensembles of quantum particles.

The first step in this direction would be to consider the eigenvector distribution of a Gibbs density matrix of a large system of quantum particles or 'spins'. An eigenvector $\phi$ of the density matrix can in principle be identified as a result of a quantum 'measurement', and the probability that in the grand canonical Gibbs ensemble the system chooses a pure eigenstate $|\phi\rangle\langle\phi|$ is proportional to $\exp(-\beta\mu n-\beta\lambda)$. Here $n$ is the number of particles in state $|\phi\rangle\langle\phi|$, $\mu$ represents the chemical potential (and $z=e^{-\beta\mu}$ the 'fugacity'), $\lambda$ is the corresponding energy eigenvalue and $\beta=1/(kT)$, where $T$ is the absolute temperature and $k$ the Boltzmann constant. The idea of our approach is that the corresponding eigenvector may usually be represented as a long sequence of numbers ('digits'). If the quantum ensemble carries 'enough randomness', such a sequence can be treated as a sample of a random process or field. It seems interesting to analyse such a process or field from the point of view of (classical) information theory.

A natural (and simplest) example to consider is a system of free quantum particles in a volume $\Lambda\subset\mathbb R^d$ (an open bounded domain with piecewise smooth boundary $\partial\Lambda$). The interaction here is manifested through the chosen statistics (Bose or Fermi). The grand canonical Gibbs ensemble in $\Lambda$ is described by a quasi-free bosonic or fermionic density matrix $\rho_\pm$ in the Fock Hilbert space $\mathcal F_\pm$ associated with volume $\Lambda$ (the index $\pm$ indicates Bose or Fermi statistics). Such a state is generated by the one-particle Hamiltonian $H$ ($=H_\Lambda$), a self-adjoint operator in the one-particle complex Hilbert space $\mathcal H$ ($=\mathcal H_\Lambda$), given values of the thermodynamical parameters $\beta$ and $\mu$.

A typical model is where $\mathcal H_\Lambda=L^2(\Lambda)$ and the operator $H$ is minus one-half of the Laplacian with a 'classical' boundary condition on $\partial\Lambda$; see for example [B-R], Sections 5.2.4 and 5.2.5. In this case we assume that (a) $\beta>0$ and (b) $\mu>0$ for bosons and $-\infty<\mu<\infty$ for fermions. A lattice version of such a model is where $\mathcal H_\Lambda$ is the Hilbert space whose (complex) dimension equals $\#(\Lambda\cap\mathbb Z^d)$, the number of points $l=(l_1,\dots,l_d)\in\mathbb Z^d$ with integer components $l_j$ within $\Lambda$. Here, $H$ may be minus one-half of the discrete Laplacian, again with a 'classical' boundary condition on $\partial(\Lambda\cap\mathbb Z^d)$.

Suppose that $H_\Lambda$ has a pure discrete spectrum and the eigenvalues of $H_\Lambda$ (counted with their multiplicities) are $\gamma_n^\Lambda$, with $\min_{n\in\mathcal N}\gamma_n^\Lambda=0$. Here $n$ runs over a finite or denumerable set $\mathcal N$ ($=\mathcal N_\Lambda$) and $\sum_{n\in\mathcal N}\exp(-\beta\gamma_n^\Lambda)<\infty$ for all $\beta>0$. For instance, if $\Lambda\subset\mathbb R^d$ is a cube $(-L/2,L/2)^d$ and $H=-\frac12\Delta$ with periodic boundary conditions, $\mathcal N$ coincides with the integer cubic lattice $\mathbb Z^d$ and $\gamma_n^\Lambda=2\pi^2\|n\|^2/L^2$, where $\|n\|=(n_1^2+\dots+n_d^2)^{1/2}$, $n\in\mathbb Z^d$.

An eigenvector $\phi$ of the quasi-free density matrix $\rho_\Lambda$ is associated with a sequence of occupation numbers $k=\{k_n,\ n\in\mathcal N\}$ (we will also write $\phi=\phi_k$). More precisely, $k_n$ is a non-negative integer equal to the number of particles in the eigenstate of $H$ with the eigenvalue $\gamma_n^\Lambda$; in the fermion case, $k_n=0$ or $1$. It is convenient to set $K_+=\mathbb Z_+:=\{0,1,2,\dots\}$ for the boson case and $K_-=\{0,1\}$ for the fermion case. In both cases, the number of non-zero entries $k_n$ in a given $k$ is finite, with the sum $\sum_{n\in\mathcal N}k_n$ representing the number of particles. The corresponding eigenvalue is, up to a normalising factor,

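$$\lambda\ (=\lambda_k^\Lambda)=\exp\Big(-\beta\sum_{n\in\mathcal N}k_n(\mu+\gamma_n^\Lambda)\Big).\qquad(1)$$

Thus the probability that the system will be found in the pure state $|\phi_k\rangle\langle\phi_k|$ is proportional to

$$\exp\Big(-\beta\sum_{n\in\mathcal N}k_n(\mu+\gamma_n^\Lambda)\Big)=\prod_{n\in\mathcal N}\Big(\exp\big(-\beta(\mu+\gamma_n^\Lambda)\big)\Big)^{k_n}.\qquad(2)$$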

In other words, a free quantum ensemble produces an 'array' $K=\{K_n,\ n\in\mathcal N\}$ of random variables $K_n$ with probability determined by Equation (2). Throughout we use the convention that upper case letters refer to random variables, and lower case letters to the values that they take. The product form of (2) means that the random variables $K_n$, $n\in\mathcal N$, are independent (but not identically distributed). The marginal distribution of $K_n$ is geometric for bosons and two-point for fermions. Let $\mathcal P$ ($=\mathcal P_\pm$) denote the induced probability distribution on $K_\pm^{\mathcal N}$, supported by the set $\mathcal K_\pm$ of arrays with finitely many non-zero components; it is convenient to think that $\mathcal P$ is determined by the quadruple $(\mathcal H_\Lambda,H_\Lambda,\beta,\mu)$.
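To make the product structure concrete, here is a minimal Python sketch (not from the paper) that draws one array from $\mathcal P^L$ in dimension $d=1$. It assumes the eigenvalues have the form $\theta(i/L)$ for $i=1,\dots,L$, as in Section 2; the function name `sample_occupations` and its parameters are hypothetical. Bosonic sampling needs $\mu>0$ so that every Boltzmann factor is below 1.

```python
import math
import random

# Illustrative sketch; the name sample_occupations and its parameters are hypothetical.
def sample_occupations(L, beta, mu, theta, statistics="bose", rng=random):
    """Draw K = (K_1, ..., K_L) from the product measure P^L in dimension d = 1.

    Mode i carries the factor q_i = exp(-beta * (mu + theta(i / L))), as in (2).
    Bosons:   K_i is geometric on {0, 1, 2, ...} with P(K_i = k) = (1 - q_i) q_i^k.
    Fermions: K_i is two-point with P(K_i = 1) = q_i / (1 + q_i).
    """
    ks = []
    for i in range(1, L + 1):
        a = beta * (mu + theta(i / L))   # a = -log q_i
        if statistics == "bose":
            # Inverse-CDF sampling from P(K_i >= k) = q_i^k; needs a > 0 (mu > 0).
            u = 1.0 - rng.random()       # uniform on (0, 1]
            ks.append(int(-math.log(u) / a))
        else:
            q = math.exp(-a)
            ks.append(1 if rng.random() < q / (1.0 + q) else 0)
    return ks
```

For example, `sample_occupations(1000, 1.0, 0.5, lambda t: 2 * math.pi**2 * t**2)` gives one bosonic sample for the periodic-box model of Section 2.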

Now assume that $\{\Lambda\}$ is an increasing sequence of volumes in $\mathbb R^d$ eventually covering the whole of $\mathbb R^d$, writing $\Lambda\nearrow\mathbb R^d$. It is convenient to think that $\Lambda$ is the result of the homothetic dilation of a fixed open bounded domain $\Lambda^0\subset\mathbb R^d$ containing the origin and with a piecewise smooth boundary $\partial\Lambda^0$ consisting of finitely many smooth parts. In the above example, we can think of $\Lambda^0$ as the unit cube $(-1/2,1/2)^d$ and of $\Lambda=(-L/2,L/2)^d$ as its dilation by the linear factor $L$.

A question arises then: what are the properties of the 'source' $(\mathcal K,\mathcal P_\Lambda)$? To what extent can classical coding theory be applied to such a source (or rather a sequence of sources, as $\Lambda\nearrow\mathbb R^d$)? Some classical results are easily extended to the case of $(\mathcal K,\mathcal P_\Lambda)$ (after all, $\mathcal P_\Lambda$ is a product distribution, albeit not stationary). For example, an asymptotic equipartition property (AEP) for $(\mathcal K,\mathcal P_\Lambda)$ is fairly straightforward (see Proposition 1). [This property can be considered as an analogue of the famous Shannon-McMillan-Breiman Theorem in the situation under consideration.] The corresponding information rate coincides with the von Neumann entropy per unit 'volume' of the limiting quantum free ensemble.

However, results beyond the AEP, such as the classical Lempel-Ziv universal encoding algorithm, are more tricky to establish. The Lempel-Ziv algorithm, in its various forms, is perhaps the most popular encoding method in modern information transmission. The idea of the algorithm, in the form of 'parsing' originally proposed by Ziv and Lempel [Z-L], is very simple. Suppose we have a sequence $x_0,x_1,x_2,\dots$ of 'letters' from an 'alphabet' (say, $x_i\in\{0,1\}$, the binary alphabet). We put a marker sign (say, a semicolon ;) after $x_0$. If $x_1\neq x_0$, we put the marker sign after $x_1$; otherwise (i.e., if $x_0=x_1$), we put it after $x_2$. Continuing this procedure, given that the last marker sign was after $x_j$, we put the next marker sign after the letter $x_{j'}$, $j'>j$, if for all $s=1,\dots,j'-j-1$ the 'word' $(x_{j+1},\dots,x_{j+s})$ is among the 'blocks' formed between the subsequent marker signs already in place, but the 'word' $(x_{j+1},\dots,x_{j'})$ has not been seen before.

This gives rise to the following encoding method: each new parsed word has a 'header' (the word less the last letter) which has been seen before. Thus, to 'encode' this bit of the sequence $x_0,x_1,x_2,\dots$ we need only to indicate the place where the header was seen in the past and, in addition, encode the last letter of the new block.
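The parsing and the pointer-plus-letter encoding just described can be sketched in a few lines of Python. This is an illustrative LZ78-style implementation under our own conventions (index 0 stands for the empty header, and a trailing piece that repeats an earlier word is sent as a pointer only); the function names are hypothetical, and the bit count is the rough $c(\log_2L+1)$ estimate for a binary alphabet rather than an optimised coder.

```python
import math

# Illustrative LZ78-style sketch; function names are hypothetical.
def lz_encode(xs):
    """Parse xs into Lempel-Ziv words; encode each as (header index, last letter).

    Index 0 is the empty header and the j-th new word gets index j.  A final piece
    that repeats an earlier word is sent as a pointer only (letter None).
    """
    index = {(): 0}
    pairs = []
    start = 0
    for end in range(1, len(xs) + 1):
        block = tuple(xs[start:end])
        if block not in index:              # shortest block not seen before
            pairs.append((index[block[:-1]], block[-1]))
            index[block] = len(index)
            start = end                     # the next marker goes here
    if start < len(xs):
        pairs.append((index[tuple(xs[start:])], None))
    return pairs


def lz_decode(pairs):
    """Invert lz_encode by rebuilding the dictionary of words."""
    words, out = [()], []
    for idx, letter in pairs:
        word = words[idx] + (() if letter is None else (letter,))
        words.append(word)
        out.extend(word)
    return out


def code_length_bits(xs):
    """Rough cost c * (log2 L + 1) bits for a binary message of length L."""
    return len(lz_encode(xs)) * (math.log2(len(xs)) + 1)
```

For the message 010011010 the parsing is 0;1;00;11;01;0, that is, five new words followed by a repeated final piece.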

The popularity of this algorithm is due to its universal character (no knowledge of the properties of the source is required to implement it), and to the fact that asymptotically it achieves the data-compression limit. However, this convergence is slow, leading to adaptations of the algorithm, including the so-called Grassberger [G] form of the algorithm, which also suits the multi-dimensional situation ($d>1$).

In Sections 3 and 8 we state our main results (see Theorem 1): the Lempel-Ziv algorithm is valid (again with the von Neumann entropy as the information rate). In the higher-dimensional case, we establish this result in Grassberger's form, and in the one-dimensional case, we also prove it in terms of the classical Lempel-Ziv algorithm. The proofs are given in Sections 4-8.

Our assumptions on the quadruple $(\mathcal H,H,\mu,\beta)$ follow the basic model outlined above, where $\Lambda=(-L/2,L/2)^d$, $\mathcal H_\Lambda=L^2(\Lambda)$ and $H_\Lambda=-\frac12\Delta$ with periodic boundary conditions. [The lattice version of this model can also be easily incorporated.] We consider fixed $\beta>0$, and $\mu>0$ for bosons and $-\infty<\mu<\infty$ for fermions. Although this formally excludes Bose-Einstein condensation, the condensation is in fact largely irrelevant to our results. We intend to discuss this issue in a separate paper. Furthermore, many of the properties obtained in this paper can in turn be extended to systems with interaction. The corresponding results are now in preparation.

We would like to point out an essential non-uniqueness of the definition of the quantum entropy (or entropies); see [C-N-T]. From this point of view, it would be interesting to clarify the relation of various concepts of quantum entropy with quantum information theory.



2. Preliminary results 

Our main assumptions are that (a) the set $\mathcal N$ coincides with $\mathbb Z^d$, the cubic lattice, and so the collection of 'arrays' $\mathcal K_\pm\subset K_\pm^{\mathbb Z^d}$ consists of functions on $\mathbb Z^d$ with finite supports and with values in $K_+=\mathbb Z_+$ for bosons and $K_-=\{0,1\}$ for fermions; (b) the eigenvalues $\gamma_n^L$, $n\in\mathbb Z^d$, of the one-particle Hamiltonian $H_\Lambda$ are of the form $\theta(\|n\|/L)$, where (c) $L=L(\Lambda)$ is a parameter increasing to $\infty$ as the sequence of volumes $\Lambda\nearrow\mathbb R^d$ (it will be convenient to assume that $L$ simply runs over the set of natural numbers); and (d) $\theta:[0,\infty)\to[0,\infty)$ is a given continuous function such that $\theta(x)>0$ for $x>0$, and such that the following integral is finite:

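$$\int_{\mathbb R^d}\left(\mp\log\left(1\mp e^{-\beta(\theta(\|y\|)+\mu)}\right)+\frac{\beta(\theta(\|y\|)+\mu)}{e^{\beta(\theta(\|y\|)+\mu)}\mp1}\right)dy,\qquad(3)$$

where we take the first choice of each $\mp$ for bosons, for all $\beta,\mu>0$, and the second choice for fermions, for all $\beta>0$, $-\infty<\mu<\infty$.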

The parameter $L$ can be thought of as a 'linear size' of $\Lambda$ and is henceforth used instead of $\Lambda$. In other words, we fix a sequence of positive numbers $L\to\infty$ replacing $\Lambda\nearrow\mathbb R^d$, say $L=1,2,\dots$. As was suggested, it is convenient to think that $\Lambda$ is the cube $(-L/2,L/2)^d$.

In the model where $H_\Lambda=-\frac12\Delta$ with periodic boundary conditions, $\theta(t)=2\pi^2t^2$.

Definition 1. The integrals (3) are called the von Neumann entropy per unit volume in the free boson/fermion limiting Gibbs ensemble and are denoted by $h_\pm$. The restriction of the integral to a domain $\Gamma\subset\mathbb R^d$ is denoted by $h_\pm^{(\Gamma)}$.



Remark 1. The reasoning behind this definition is as follows. The probability measure $\mathcal P^L$ has been specified by Equation (2) as the product $\times_{n\in\mathbb Z^d}\pi_n$, where $\pi_n$ is the geometric distribution with parameter $e^{-\beta(\gamma_n+\mu)}$ for bosons and the two-point distribution with

$$\pi_n(\{0\})=\frac{1}{1+e^{-\beta(\gamma_n+\mu)}},\qquad\pi_n(\{1\})=\frac{e^{-\beta(\gamma_n+\mu)}}{1+e^{-\beta(\gamma_n+\mu)}}$$

for fermions. The entropy of $\mathcal P^L$ divided by $L^d$ (the volume of $\Lambda$) is simply a Riemann sum for the integral $h_\pm$ and converges to $h_\pm$ as $L\to\infty$. On the other hand, the entropy of $\mathcal P^L$ is equal to the von Neumann entropy $-\mathrm{tr}\,(\rho^L\log\rho^L)$ of the density matrix $\rho^L$ corresponding to the Gibbs ensemble of free quantum particles in $\Lambda$, for given $\beta$ and $\mu$.
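Remark 1 can be checked numerically. The sketch below assumes $d=1$ and $\theta(t)=2\pi^2t^2$ (the periodic-box model). It evaluates the integrand of (3) in closed form, compares it with $-\sum p\log p$ computed directly from the geometric or two-point law, and forms the Riemann sum $(1/L)\sum_nE_n$ over modes with $|n|/L\le10$. The function names and the cut-off are illustrative choices, not part of the paper.

```python
import math

# Illustrative sketch; function names and the cut-off are hypothetical choices.
def site_entropy(x, beta, mu, statistics):
    """Entropy of the occupation number of a mode with energy x: the integrand of (3)."""
    a = beta * (x + mu)
    if a > 700.0:                   # contribution below double precision
        return 0.0
    if statistics == "bose":        # geometric with parameter e^{-a}; needs a > 0
        return -math.log1p(-math.exp(-a)) + a / math.expm1(a)
    return math.log1p(math.exp(-a)) + a / (math.exp(a) + 1.0)   # two-point law


def direct_entropy(x, beta, mu, statistics, kmax=4000):
    """-sum p log p computed straight from the distribution, for comparison."""
    q = math.exp(-beta * (x + mu))
    if statistics == "bose":
        probs = [(1.0 - q) * q ** k for k in range(kmax)]
    else:
        probs = [1.0 / (1.0 + q), q / (1.0 + q)]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_per_volume(L, beta, mu, statistics, theta, cutoff=10.0):
    """(1/L) * entropy of P^L for d = 1: the sum over modes n in Z with |n|/L <= cutoff."""
    N = int(cutoff * L)
    return sum(site_entropy(theta(abs(n) / L), beta, mu, statistics)
               for n in range(-N, N + 1)) / L
```

Because the integrand is smooth and decays rapidly, the Riemann sums settle down already for moderate $L$.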

It is easy to check the following law of large numbers. 

Proposition 1. Consider the random variable $k\mapsto\xi^L(k)=-(1/L^d)\log\lambda^L(k)$, $k\in\mathcal K_\pm$, where $\lambda^L(k)$ is the eigenvalue of the Gibbs ensemble density matrix $\rho^L_\pm$ determined by the function $k:\mathbb Z^d\to K_\pm$. Then, for all $\epsilon>0$, $\lim_{L\to\infty}\mathcal P^L(|\xi^L-h_\pm|>\epsilon)=0$. Also, $\lim_{L\to\infty}\xi^L(K^L)=h_\pm$ almost surely (a.s.) with respect to the product measure $\mathcal P^{\times L}$ on the Cartesian power $\mathcal K^{\times L}$, with the sequence $(K^L)$ of $\mathcal P^L$-random elements.
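A quick simulation illustrates Proposition 1 for bosons in $d=1$. The sketch assumes $\theta(t)=2\pi^2t^2$ and keeps only modes with $|n|\le L$, so what it approximates is the restricted rate $h_+^{(\Gamma)}$ of Definition 1 with $\Gamma=[-1,1]$; the function name and the cut-off are hypothetical. It returns $-(1/L)\log\mathcal P^L(K)$ for one sample together with its mean $(1/L)\sum_nE_n$.

```python
import math
import random

# Illustrative sketch; aep_check and the mode cut-off |n| <= L are hypothetical choices.
def aep_check(L, beta, mu, theta, rng):
    """One bosonic sample K ~ P^L in d = 1, using modes n with |n| <= L.

    Returns (-(1/L) log P^L(K), (1/L) sum_n E_n); by Proposition 1 the two numbers
    approach each other as L grows.
    """
    neg_log_p = 0.0
    entropy = 0.0
    for n in range(-L, L + 1):
        a = beta * (mu + theta(abs(n) / L))              # -log q_n, positive for mu > 0
        k = int(-math.log(1.0 - rng.random()) / a)       # geometric sample
        neg_log_p += -math.log1p(-math.exp(-a)) + a * k  # -log((1 - q_n) q_n^k)
        entropy += -math.log1p(-math.exp(-a)) + a / math.expm1(a)
    return neg_log_p / L, entropy / L
```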

A straightforward consequence of Proposition 1 is

Corollary 1. List the eigenvalues $\lambda^L(k)$, $k\in\mathcal K_\pm$, in decreasing order: $\lambda_{(0)}\ge\lambda_{(1)}\ge\dots$. Given $\epsilon\in(0,1)$, select the eigenvalues in their order until the sum of the selected $\lambda$'s becomes greater than or equal to $1-\epsilon$ for the first time. Let $M^L_\pm$ denote the number of selected eigenvalues. Then $\lim_{L\to\infty}L^{-d}\log M^L_\pm=h_\pm$.

Definition 1, Remark 1, Proposition 1 and Corollary 1 specify an asymptotic equipartition property for the probability measures $\mathcal P^L$, and $h_\pm$ can be considered as an analogue of the information rate for $(\mathcal K,\mathcal P^L)$.
3. Main result

For the rest of the paper, $k\in\mathcal K_\pm$ is a function $\mathbb Z^d\to K_\pm$ with compact support; we identify it with the collection of values $k_n$, $n\in\mathbb Z^d$. Given a probability measure $\mathcal P^L$ on $\mathcal K_\pm$, $K$ stands for an array of random variables $\{K_n\}$ representing the random element of $\mathcal K_\pm$. When considering the product measure $\mathcal P^{\times L}$ on the Cartesian product $\mathcal K^{\times L}$, we denote by $K^L$ the $\mathcal P^L$-random element of $\mathcal K$.

Definition 2. Given $k\in\mathcal K$, $u=(u_1,\dots,u_d)\in\mathbb Z^d$ and $s\ge1$, define the cubic box $B_u(s)$ to have bottom corner $u$ and side $s$:

$$B_u(s)=\{n=(n_1,\dots,n_d)\in\mathbb Z^d:\ u_l\le n_l\le u_l+s-1\ \text{for all}\ l=1,\dots,d\}.$$

Write $k_u(s)=\{k_n:\ n\in B_u(s)\}$ for the set of values of $k$ confined to this box.

Now define $r_u^L(k)$ to be the size of the smallest box with bottom corner at position $u$ whose values differ from those of every other box of the same size with bottom corner in $B_{\mathbf 1}(L)$, $\mathbf 1=(1,\dots,1)\in\mathbb Z^d$:

$$r_u^L(k)=\inf\{s\ge1:\ k_u(s)\neq k_v(s)\ \text{for all}\ v\neq u,\ v\in B_{\mathbf 1}(L)\}.$$

For $K$ a random array, we define the random variable $R_u^L(K)$ in the same way.
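In dimension $d=1$ the match lengths of Definition 2, and the resulting entropy estimate (the $d=1$, $\zeta=1$ form of the average appearing in Theorem 1 below), can be computed by sorting the pieces lexicographically, since the longest common prefix with any other piece is attained by a neighbour in sorted order. This is a sketch under our own assumptions: the string is padded by `window - 1` extra symbols beyond position $L$, all matches are assumed shorter than `window`, and the helper names are hypothetical.

```python
import math
import random

# Illustrative sketch for d = 1; match_lengths and entropy_estimate are hypothetical names.
def match_lengths(k, L, window=64):
    """r_i^L(k) for the L starting positions 0, ..., L-1 (0-based version of 1..L).

    r_i is 1 + the longest prefix that the piece starting at i shares with a piece
    starting at any other j < L.  That longest prefix is shared with a neighbour in
    lexicographic order, so one sort suffices.  Assumes len(k) >= L + window - 1
    and that every match is shorter than `window`.
    """
    order = sorted(range(L), key=lambda i: tuple(k[i:i + window]))

    def lcp(i, j):
        n = 0
        while n < window and k[i + n] == k[j + n]:
            n += 1
        return n

    r = [0] * L
    for pos, i in enumerate(order):
        best = 0
        if pos > 0:
            best = max(best, lcp(i, order[pos - 1]))
        if pos + 1 < L:
            best = max(best, lcp(i, order[pos + 1]))
        r[i] = best + 1
    return r


def entropy_estimate(k, L, window=64):
    """(1/L) * sum_i log L / r_i^L: the d = 1, zeta = 1 average of Theorem 1."""
    return sum(math.log(L) / r for r in match_lengths(k, L, window)) / L


# Usage: a fair-coin string, whose entropy is log 2 (about 0.693 nats).
rng = random.Random(11)
L = 4096
bits = [rng.randint(0, 1) for _ in range(L + 63)]
print(entropy_estimate(bits, L))
```

Note that the pieces $k_i(r_i^L)$ are automatically pairwise distinct: pieces of different lengths differ, and two pieces of the same length $r_i^L$ differ by the definition of $r_i^L$.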

These have been studied by authors such as Grassberger [G], Kontoyiannis and Suhov [K-S], Quas [Q] and Shields [Sh1], [Sh2], first in the one-dimensional case and later for higher dimensions, partly because they serve as good entropy estimators for an ergodic process with a suitable degree of mixing. For example, Theorem 1 of [Q] shows:

If the array $K$ is generated by a $\mathbb Z^d$-invariant ergodic probability measure on $\mathcal K_\pm$ with entropy $h$, under a Doeblin condition,

$$\lim_{L\to\infty}\frac{1}{L^d}\sum_{u\in B_{\mathbf 1}(L)}\frac{(R_u^L(K))^d}{\log L}=\frac1h,\qquad\lim_{L\to\infty}\frac{1}{L^d}\sum_{u\in B_{\mathbf 1}(L)}\frac{\log L}{(R_u^L(K))^d}=h,\quad\text{a.s.}$$

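Our Theorem 1 below shows how a similar result looks for the sequences $(\mathcal K_\pm^{\times L},\mathcal P_\pm^{\times L})$:

Theorem 1. For all fixed $\zeta>0$, on $\mathcal K_\pm^{\times L}$, $\mathcal P_\pm^{\times L}$-a.s.:

$$\lim_{L\to\infty}\frac{1}{L^d}\sum_{u\in B_{\mathbf 1}(\zeta L)}\frac{\log L}{(R_u^{\zeta L}(K^L))^d}=h_\pm^{(B(\zeta))}.$$

Here $h_\pm^{(B(\zeta))}$ is the 'truncated' von Neumann entropy (cf. Definition 1), where $B(\zeta)$ is the cube $[0,\zeta]^d$.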

We can deal with the case of $\zeta$ increasing with $L$, under extra assumptions on the behaviour of $\theta$.

Assumption 2. For all $\eta>0$, there exist $C,\delta>0$ such that, uniformly in $x>\eta$ and $y<\delta$: $\theta(x+y)/\theta(x)<C$.

Assumption 3. $\zeta\to\infty$, slowly enough that $\zeta/\log L\to0$.
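Theorem 4. If $\theta$ satisfies Assumption 2 and $\zeta$ satisfies Assumption 3, then on $\mathcal K_\pm^{\times L}$, $\mathcal P_\pm^{\times L}$-a.s.:

$$\lim_{L\to\infty}\frac{1}{L^d}\sum_{u\in B_{\mathbf 1}(\zeta L)}\frac{\log L}{(R_u^{\zeta L}(K^L))^d}=h_\pm.$$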

Remark 2. Alternatively, in the spirit of previous analysis, we can average the quantities $(R_u^L)^d$ themselves. However, Theorem 1 in our view gives a more useful result for von Neumann entropy estimation.

For the sake of clarity, we focus on the case $\zeta=1$ (though we indicate in due course how the case of $\zeta$ varying with $L$ can naturally be dealt with) and first prove the one-dimensional ($d=1$) version of the result for geometric variables (bosons), in Sections 4 and 5. In Section 6, we indicate the adaptations needed in the case of two-valued variables (fermions), and in Section 7, we show how the method adapts to the case of higher dimensions. We split the proof of the result into 3 parts, corresponding to Lemmas 6, 7 and 8 used in [Q]. In each case, writing $E_u^L$ for the entropy of $K_u$ under $\mathcal P^L$, we will show that $\log L/(R_u^L)^d$ is close to $E_u^L$.



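Lemma 1. For any $\epsilon>0$, $\mathcal P^{\times L}$-a.s.:

$$\lim_{L\to\infty}\frac{1}{L^d}\,\#\Big\{u\in B_{\mathbf 1}(L):\ R_u^L(K^L)<\Big(\frac{(1-\epsilon)\log L}{E_u^L}\Big)^{1/d}\Big\}=0.$$

Lemma 2. For any $\epsilon>0$, $\mathcal P^{\times L}$-a.s.:

Lemma 3. There exists a constant $c=c(\theta)$ such that, $\mathcal P^{\times L}$-a.s.,

$$\limsup_{L\to\infty}\ \max_{u\in B_{\mathbf 1}(L)}\frac{R_u^L(K^L)}{(\log L)^{1/d}}\le c.$$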


4. Proof of lower bound 

Recall that in the next two sections we concentrate on the one-dimensional geometric case and take $\zeta=1$. So an array $k\in\mathcal K$ is now a 'string' $\{k_i,\ i\in\mathbb Z\}$, where $k_i$ is a non-negative integer. Write $k_i(s)$ for the finite piece $(k_i,\dots,k_{i+s-1})$ of the string $k$ of length $s$ starting at position $i$, where $i,s\in\mathbb Z$, $s\ge1$. Then $r_i^L(k)$ is the length of the shortest piece starting at position $i$ whose values differ from those of all the other pieces starting in $\{1,\dots,L\}$: $r_i^L(k)=\inf\{s\ge1:\ k_i(s)\neq k_j(s)\ \text{for all}\ j\neq i\ \text{in}\ \{1,\dots,L\}\}$. Accordingly, $r_i^L(k)$ is often called the match length. Here $\theta$ is a continuous function $[0,\infty)\to[0,\infty)$ and $E_i^L$ is the entropy of the geometric distribution with parameter $e^{-\beta(\theta(i/L)+\mu)}$. Write $\theta^*$ for $\sup_{x\in[0,1]}\theta(x)$.
We use the idea of a 'typical set', familiar from ergodic and information theory. The aim is to show that usually we belong to this typical set $S$, which provides extra conditions ensuring that the match length $R_i^L$ cannot be too low too often.

Definition 3. For a string $k=\{k_i,\ i\in\mathbb Z\}$, we define the centred log-likelihood:

$$y_i^L(k)=-\log\mathcal P^L(K_i=k_i)-E_i^L=\beta\big(\mu+\theta(i/L)\big)\big(k_i-\mathbb E^LK_i\big),$$

and for the random string $K$ we define the random variable $Y_i^L(K)$ in the same way. Here and below, $\mathbb E^L$ stands for the expectation relative to $\mathcal P^L$. Define the typical set by

$$S^L_{j,M}=\Big\{k:\ \sum_{i=j}^{j+M-1}y_i^L(k)\le M\theta(j/L)\epsilon'\Big\},$$

where $\epsilon'=\beta\epsilon\big(-\log(1-e^{-r})\big)/(2\theta^*)$.

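Proof of Lemma 1. For any sequence $M(i)$ and any $\eta>0$, we deal with the first $\eta L$ variables separately:

$$\Big\{k:\ \#\Big\{i:\ R_i^L(k)<\frac{(1-\epsilon)\log L}{E_i^L}\Big\}\ge3\eta L\Big\}\subseteq\Big\{k:\ \#\Big\{i>\eta L:\ R_i^L(k)<\frac{(1-\epsilon)\log L}{E_i^L},\ k\in S^L_{i,M(i)}\Big\}\ge\eta L\Big\}\cup\Big\{k:\ \#\{i>\eta L:\ k\notin S^L_{i,M(i)}\}\ge\eta L\Big\}.$$

We bound the size of the first set in Lemma 4 and the size of the second in Lemma 5. □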

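Lemma 4. Given $\eta,\epsilon>0$, we can find a sequence $M(i)$ and a constant $C_1(\eta,\epsilon)$ such that for any $L\ge C_1$ and for any $k\in\mathcal K_+$:

$$\frac1L\,\#\Big\{i>\eta L:\ r_i^L(k)<\frac{(1-\epsilon)\log L}{E_i^L},\ k\in S^L_{i,M(i)}\Big\}<\eta.$$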

Proof. We can find intervals $J_i$ in which our variables have their means close together. Note that $f(x)=1/(e^{\beta(\mu+x)}-1)$ has derivative bounded below on $x>\epsilon>0$. Hence, given $\theta$ and $\epsilon$, we can calculate $N=N(\epsilon)$ and $u_1,\dots,u_N$ with $u_1=\eta$, $u_N=1$ such that

$$\left|\frac{1}{e^{\beta(\mu+\theta(u_i))}-1}-\frac{1}{e^{\beta(\mu+\theta(u_{i+1}))}-1}\right|<\epsilon',\qquad i=1,\dots,N-1,$$

where $\epsilon'$ is from Definition 3. Defining $J_i=\{m:\ m/L\in(u_i,u_{i+1})\}$, for each $j\in J_i$ define $M(j)=(1-\epsilon)\log L/E_{u_{i+1}}$.

We compare $S^L_{j,M}$ with $D_{\gamma,M}$, a set which we can count and control more easily. For each $\gamma$, $M$, write $E_\gamma$ for the entropy of a geometric distribution with parameter $e^{-\gamma}$, and set

$$D_{\gamma,M}=\Big\{x_1(M)=(x_1,\dots,x_M):\ \sum_{i=1}^Mx_i\le M\Big(\frac{1}{e^\gamma-1}+\frac{\epsilon E_\gamma}{\gamma}\Big)\Big\}.$$
For $x_1(M)\in D_{\gamma,M}$, writing $P_\gamma$ for the product measure of independent geometric random variables with parameter $e^{-\gamma}$:

$$P_\gamma(x_1(M))=\exp\Big(M\log(1-e^{-\gamma})-\gamma\sum_{i=1}^Mx_i\Big)\ge\exp\big(-ME_\gamma(1+\epsilon)\big).$$

If $k\in S^L_{j,M}$, where $j\in J_i$, then taking $\gamma=\sup_{x\in(u_i,u_{i+1})}\beta(\mu+\theta(x))$:

$$\sum_{l=j}^{j+M-1}k_l\le M\Big(\frac{1}{e^\gamma-1}+\frac{\epsilon E_\gamma}{\gamma}\Big),$$

so $k_j(M)=(k_j,\dots,k_{j+M-1})\in D_{\gamma,M}$. We therefore know that if $k\in S^L_{j,M(j)}$ and $r_j^L(k)<(1-\epsilon)\log L/E_j^L\le M(j)$, then

$$P_\gamma\big(k_j,\dots,k_{j+r_j^L(k)-1}\big)\ge P_\gamma\big(k_j,\dots,k_{j+M(j)-1}\big)\ge\exp\big(-M(j)E_\gamma(1+\epsilon)\big).$$

Since these finite strings are distinct, the number of strings in $J_i$ for which these two conditions hold is at most the reciprocal of this lower bound. Summing over the intervals $J_i$, the total number of such strings is less than $L(L^{-\epsilon}N)$. Hence if $L\ge C_1(\eta,\epsilon)=(N(\epsilon)/\eta)^{1/\epsilon}$, then $L^{-\epsilon}N<\eta$ and the assertion holds.

If $\zeta/\log L\to0$, then since $N$ grows linearly with $\zeta$, $L^{-\epsilon}N$ still tends to zero, as required. □

Note that the precise definition of the match length $r_i^L(k)$ does not matter, and that this analysis will go through for a variety of related definitions. For example, the original Lempel-Ziv parsing algorithm (see the Introduction) or one-sided definitions of match lengths can be analysed in the same way. The key observation is that the pieces $k_i(r_i^L)$ are distinct strings. We deal with these issues in Section 8.

Next, we show that most of the time we are in the typical set $S^L_{i,M(i)}$, by using a series of applications of Chebyshev's inequality.

Lemma 5. Suppose $\zeta$ is fixed, or that $\theta$ satisfies Assumption 2 and $\zeta$ satisfies Assumption 3. For any $\eta>0$, $\mathcal P^{\times L}$-a.s.:

$$\frac1L\sum_{i=\eta L+1}^{\zeta L}\mathbb I\big(K^L\notin S^L_{i,M(i)}\big)>\eta$$

for only finitely many values of $L$.
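Proof. We require

$$\lim_{L\to\infty}\max_i\nu_i^L=0,\qquad(4)$$

where $\nu_i^L:=\mathcal P^L(K^L\notin S^L_{i,M(i)})$. The key is a uniform bound on the fourth moment $\mathbb E^L(Y_j^L)^4$ of $Y_j^L=\beta(\mu+\theta(j/L))(K_j-\mathbb E^LK_j)$. Note that for $X$ a geometric variable with parameter $q$, $\mathbb E(X-\mathbb EX)^4=q(q^2+7q+1)/(1-q)^4\le9/(1-q)^4$, so that, writing $\theta_\eta$ for $\inf_{x>\eta}\theta(x)$ and taking $j>\eta L$:

$$\mathbb E^L(Y_j^L)^4\le\frac{9\beta^4(\mu+\theta(j/L))^4}{\big(1-e^{-\beta(\mu+\theta(j/L))}\big)^4}\le C\big(\mu+\theta(j/L)\big)^4,\qquad C:=\frac{9\beta^4}{\big(1-e^{-\beta(\mu+\theta_\eta)}\big)^4}.$$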

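Hence for any set $S$,

$$\mathbb E^L\Big(\sum_{j\in S}Y_j^L\Big)^4=\sum_{j\in S}\mathbb E^L(Y_j^L)^4+3\sum_{j,k\in S,\,j\neq k}\mathbb E^L(Y_j^L)^2\,\mathbb E^L(Y_k^L)^2\le3C\max_{j\in S}\big(\mu+\theta(j/L)\big)^4(\#S)^2.$$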



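By Chebyshev's inequality, for any $i$,

$$\nu_i^L\le\frac{\mathbb E^L\big(\sum_{j=i}^{i+M(i)-1}Y_j^L\big)^4}{\big(M(i)\theta(i/L)\epsilon'\big)^4}\le\frac{3C}{\epsilon'^4}\max_{i\le j\le i+M(i)-1}\Big(\frac{\mu+\theta(j/L)}{\theta(i/L)}\Big)^4\frac{1}{M(i)^2},$$

which is less than $K/M(i)^2$ for some constant $K$, under Assumption 2. So $\max_i\nu_i^L\le K/(\min_iM(i))^2$, and since $\min_iM(i)=(1-\epsilon)\log L/\zeta$, if $\zeta/\log L\to0$ then Equation (4) holds.

Now $Z_i^L=\mathbb I(K^L\notin S^L_{i,M(i)})-\nu_i^L$ is a variable with mean 0 and variance at most $\nu_i^L$, and $Z_i^L$ and $Z_j^L$ are independent if $|i-j|>M(i)$. Hence

$$\mathrm{Var}\Big(\sum_{i=\eta L+1}^{\zeta L}Z_i^L\Big)\le\sum_{i=\eta L+1}^{\zeta L}\nu_i^LM(i)\le\frac{K\zeta L}{\min_iM(i)}=\frac{K\zeta^2L}{(1-\epsilon)\log L}.$$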
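Overall, then, we deduce that for large enough $L$ the probability of the event $\frac1L\sum_{i=\eta L+1}^{\zeta L}\mathbb I(K^L\notin S^L_{i,M(i)})>\eta$ is bounded by a quantity which is summable in $L$. □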
The proof of Lemma 1 is now complete.
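The fourth-moment identities used in the proof of Lemma 5, and its two-point analogue in Section 6, can be confirmed numerically. The sketch below (helper names hypothetical) computes $\mathbb E(X-\mathbb EX)^4$ for a geometric law by direct summation and compares it with $q(q^2+7q+1)/(1-q)^4$, and evaluates $p(1-4p+6p^2-3p^3)\log(1/p-1)^4$ for the centred log-likelihood of a two-point variable.

```python
import math

# Illustrative check; helper names are hypothetical, not from the paper.
def central_moment4(dist):
    """E(X - EX)^4 for a finite distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum(p * (x - mean) ** 4 for x, p in dist.items())


def geometric(q, kmax=6000):
    """P(X = k) = (1 - q) q^k for k < kmax (the neglected tail is negligible here)."""
    return {k: (1.0 - q) * q ** k for k in range(kmax)}


def geometric_m4(q):
    """Closed form q (q^2 + 7q + 1) / (1 - q)^4 quoted in the proof of Lemma 5."""
    return q * (q * q + 7.0 * q + 1.0) / (1.0 - q) ** 4


def two_point_m4(p):
    """E Y^4 for Y = -log P(K = k) - E_p, with K two-point and P(K = 1) = p."""
    ell = math.log(1.0 / p - 1.0)
    return p * (1.0 - 4.0 * p + 6.0 * p ** 2 - 3.0 * p ** 3) * ell ** 4
```

Since $Y=(K-p)\log(1/p-1)$ for the two-point variable, the closed form is just $\mathbb E(K-p)^4=p(1-p)\big(p^3+(1-p)^3\big)$ multiplied by $\log(1/p-1)^4$.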



5. Proof of upper bounds 

We establish the upper bound in Lemma 2 by proving a related result about return times.

Definition 4. The return time $T_{n,i}(k)$ is how long one has to wait until the substring $k_i(n)$ is repeated in $k$: $T_{n,i}(k)=\inf\{j\ge1:\ k_{i+j}(n)=k_i(n)\}$, and $T^{\mathrm{rev}}_{n,i}(k)$ is a time-reversed version: $T^{\mathrm{rev}}_{n,i}(k)=\inf\{j\ge1:\ k_{i-j}(n)=k_i(n)\}$.
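The return times of Definition 4 can be computed directly on a finite string. In this sketch (function names hypothetical) positions are 0-based, `None` signals that no repeat is visible within the finite list, and overlapping matches ($T_{n,i}<n$) are allowed, exactly as in the definition.

```python
# Illustrative sketch; return_time and reverse_return_time are hypothetical names.
def return_time(k, n, i):
    """T_{n,i}(k) = inf{j >= 1 : k_{i+j}(n) = k_i(n)}, or None if no repeat is visible."""
    word = k[i:i + n]
    for j in range(1, len(k) - n - i + 1):
        if k[i + j:i + j + n] == word:
            return j
    return None


def reverse_return_time(k, n, i):
    """T^rev_{n,i}(k) = inf{j >= 1 : k_{i-j}(n) = k_i(n)}, or None if no earlier copy."""
    word = k[i:i + n]
    for j in range(1, i + 1):
        if k[i - j:i - j + n] == word:
            return j
    return None
```

For instance, in 0101101 the pair 01 at position 0 recurs after 2 steps, while in 1111 the pair 11 recurs after a single step: an overlapping match of the kind discussed below.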



Theorem 1 of [O-W] shows that, for a stationary ergodic probability measure on $\mathcal K$ with entropy $h$, for any $i$:

$$\lim_{n\to\infty}\frac{\log T_{n,i}(K)}{n}=h\quad\text{a.s.}$$

We need a version of this result for the distributions $\mathcal P^L$. In the limit we are close to the IID case, so we lose little in comparison with that case.



Wyner and Ziv [W-Z] were the first to exploit the dual relationship between waiting times $T_{n,i}$ and match lengths $R_i^L$. We shall follow Shields' approach [Sh1], modified subsequently in [Q] and [Sh2] (to remove a confusion in [Sh1] in the way in which return times are defined, namely whether 'overlapping matches', when $T_{n,i}<n$, are counted).

A useful element introduced in [Q] is a truncation argument needed to cover the case of geometric random variables (the analysis in [Sh1] only holds for finite-alphabet processes). [Q] introduces a truncation operation $\tau_m$, where $\tau_m(x)=\min(x,m)$ and $\tau_m(k)=(\tau_m(k_i),\ i\in\mathbb Z)$. Denote the match lengths and entropies of the truncated process by $\tilde r_i^L(k)$, $\tilde R_i^L(K)$ and $\tilde E_i$. First note that $r_i^L(k)\le r_i^L(\tau_m(k))=\tilde r_i^L(k)$. Secondly, since $\tilde E_i/E_i=1-\exp(-\beta(\mu+\theta(i/L))m)$, we can choose $m$ to ensure that $E_i\le\tilde E_i(1-\epsilon/2)(1+\epsilon)$ for all $i$. Hence, we need only prove the following.
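The inequality $r_i^L(k)\le r_i^L(\tau_m(k))$ holds because truncation can only merge symbols, so any match in $k$ survives in $\tau_m(k)$. A brute-force sketch (hypothetical helper names, 0-based positions, matches cut off at the end of the finite list) makes this easy to test.

```python
# Illustrative brute-force sketch; truncate and match_length are hypothetical names.
def truncate(k, m):
    """tau_m applied coordinatewise: k_i -> min(k_i, m)."""
    return [min(x, m) for x in k]


def match_length(k, i, L):
    """Brute-force r_i^L(k), 0-based: 1 + longest common prefix with any j != i, j < L.

    Prefixes are compared only as far as the end of the finite list k.
    """
    best = 0
    for j in range(L):
        if j == i:
            continue
        n = 0
        while i + n < len(k) and j + n < len(k) and k[i + n] == k[j + n]:
            n += 1
        best = max(best, n)
    return best + 1
```

For $k=(3,0,5,0,1)$ and $m=1$, the piece at position 0 is already unique after one symbol in $k$, but $\tau_1(k)=(1,0,1,0,1)$ matches itself shifted by two, so the truncated match length is 4.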

Lemma 6. For fixed $\theta$, $\eta$, $\epsilon$, define for each string $k$

$$U^L(k)=\Big\{i>\eta L:\ \tilde r_i^L(k)>1+\frac{\log L}{\tilde E_i(1-\epsilon/2)}\Big\}.$$

Then $\limsup_{L\to\infty}\#U^L(K^L)/L\le\eta$, $\mathcal P^{\times L}$-a.s.



Proof. We mirror the duality argument (cf. Lemma 3 of [Sh1] and the Appendix of [Sh2]). Define for $N=1,2,\dots$ the forward count

$$F_N^L(k)=\Big\{i:\ \frac{\log T_{n,i}(k)}{n}<\tilde E_i(1-\epsilon/2)\ \text{for some}\ n\ge N\Big\}$$

and the backward count

$$B_N^L(k)=\Big\{i:\ \frac{\log T^{\mathrm{rev}}_{n,i}(k)}{n}<\tilde E_i(1-\epsilon/2)\ \text{for some}\ n\ge N\Big\}.$$

Now, if $i\in U^L(k)$, then there exists $j\neq i$ in $\{1,\dots,L\}$ such that $k_i(n)=k_j(n)$, where $n=\tilde r_i^L(k)-1$, so either

1. $i<j$, in which case $T_{n,i}(k)<L$, so that $\log T_{n,i}(k)/n<\log L/n<\tilde E_i(1-\epsilon/2)$; or

2. $j<i$, in which case $T^{\mathrm{rev}}_{n,i}(k)<L$, so that $\log T^{\mathrm{rev}}_{n,i}(k)/n<\log L/n<\tilde E_i(1-\epsilon/2)$.

Hence if $i\in U^L(k)$, then $i$ is in $F_N^L(k)$ or $B_N^L(k)$ for some $N$. So, using the finiteness of the (truncated) alphabet, if we can show that $F_N^L(K)$ and $B_N^L(K)$ are of low density (that is, $|F_N^L(K)|/L$ and $|B_N^L(K)|/L$ are $<\eta$ eventually, $\mathcal P^{\times L}$-a.s.), then so must $U^L(K)$ be, and the result follows.

First, we show that the number of overlapping matches is small. We mirror [Q] and define $A=\{k:\ k_1(s)=k_{s+1}(s)\ \text{for infinitely many}\ s\ge1\}$. We will show that this set has measure 0, by defining $B_m=\{k:\ k_1(m)=k_{m+1}(m)\}$, so that $A=\bigcap_l\bigcup_{m\ge l}B_m$. Following [Q], for each $m$ and $w\in\mathbb Z_+^m$, write $W(w)$ for the set of strings which begin with the word $w$ (i.e., have $k_1(m)=w$), and $W(ww)$ for the strings which begin with $w$ repeated twice. Now for $\delta>0$ consider the $\delta$-representative set $V_\delta=\{w:\ -\log\mathcal P^L(W(w))>\sum_{j=1}^m\tilde E_j-\delta\}$. On the set $V_\delta$, since the entropy is bounded below,

$$\mathcal P^L(B_m\cap V_\delta)=\sum_{w\in V_\delta}\mathcal P^L(W(ww))\le\Big(\sum_w\mathcal P^L(W(w))\Big)\exp\Big(-\sum_{j=1}^m\tilde E_j+\delta\Big)\le\exp\Big(-m\min_j\tilde E_j+\delta\Big),$$

which is summable in $m$. So a Borel-Cantelli argument establishes the result.

Next, to bound $F_N^L(K)$, we consider a word $x=(x_1,\dots,x_n)$ which lies in a $\delta$-representative set of the $k_i(n)$'s, that is, for some $\delta>0$:

$$\Big|\sum_{l=1}^n\big(x_l-\mathbb E^LK_{i-1+l}\big)\Big|<\delta/\beta.\qquad(5)$$

By direct calculation, we can bound from above the probability that $x$ turns up later, that is, for $j>i$:

$$\mathcal P^L(K_j(n)=x)\le\mathcal P^L(K_i(n)=x)\exp(\theta^*\delta).$$

Hence for any integer $t$, if Equation (5) holds:

$$\mathcal P^L\big(n+1\le T_{n,i}(K)\le t\,\big|\,K_i(n)=x\big)\le\sum_{m=i+n+1}^{i+t}\mathcal P^L\big(K_m(n)=x\,\big|\,K_i(n)=x\big)\le t\,\mathcal P^L(K_i(n)=x)\exp(\theta^*\delta)\le t\exp\big(-n\tilde E_i+2\theta^*\delta\big).$$

Then, with $t=\exp(n(\tilde E_i-\epsilon))$, we need to pick $\delta$ growing slowly enough that $\theta^*\delta/n$ tends to zero (for example $\delta=n^a$ with $1/2<a<1$; if $\zeta$ is growing more slowly than $\log L$, we can still choose an appropriate $\delta$). Considering overlapping and non-overlapping matches separately, we obtain

$$\mathcal P^L\Big(\frac{\log T_{n,i}(K)}{n}<\tilde E_i-\epsilon\,\Big|\,K_i(n)=x\Big)\le\mathcal P^L\big(T_{n,i}(K)\le n\,\big|\,K_i(n)=x\big)+\exp\big(-n\epsilon+2\theta^*\delta\big)$$

for $n$ sufficiently large. As $n\to\infty$, the probability that Equation (5) holds tends to 1. We can bound the backward set $B_N^L(K)$ similarly. □

We can now prove the uniform upper bound in a more straightforward fashion.

Proof of Lemma 3. Since for any $j$, $\max_l\mathcal P^L(K_j=l)=\mathcal P^L(K_j=0)=1-\exp(-\beta(\mu+\theta(j/L)))\le1-\exp(-\beta(\mu+\theta^*))$, for any $N$:

$$\mathcal P^L(R_i^L(K)>N)=\mathcal P^L\big(K_j(N)=K_i(N)\ \text{for some}\ j\in\{1,\dots,L\},\ j\neq i\big)\le\sum_{j=1}^L\mathcal P^L(K_j(N)=K_i(N))\le L\big(1-\exp(-\beta(\mu+\theta^*))\big)^N.$$

So $\mathcal P^L(\max_iR_i^L>N)\le L^2(1-\exp(-\beta(\mu+\theta^*)))^N$. Taking $c>-3/\log(1-\exp(-\beta(\mu+\theta^*)))$ and $N=c\log L$, the result follows.

Again, if $\zeta/\log L\to0$, the same bounds will work: since we need to make more comparisons, we replace $L^2$ by $(L\zeta)^2$, and the logarithmic term is dominated by the polynomial. □
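The choice of $c$ in the proof of Lemma 3 is pure arithmetic: with $p^*=1-\exp(-\beta(\mu+\theta^*))$ and $N=c\log L$, the union bound equals $L^{2+c\log p^*}$, which is $L^{-1-\kappa}$ for some $\kappa>0$ (hence summable in $L$) exactly when $c>-3/\log p^*$. The sketch below evaluates this; the names are hypothetical and $N$ is treated as a real number for illustration.

```python
import math

# Illustrative arithmetic for the proof of Lemma 3; names are hypothetical and
# N = c log L is treated as a real number rather than an integer.
def lemma3_constant(beta, mu, theta_star, slack=0.01):
    """A constant c slightly larger than -3 / log(1 - exp(-beta (mu + theta*)))."""
    p_star = 1.0 - math.exp(-beta * (mu + theta_star))  # max_l P^L(K_j = l), attained at l = 0
    return -3.0 * (1.0 + slack) / math.log(p_star)


def max_match_tail_bound(L, beta, mu, theta_star, c):
    """The union bound L^2 p*^N with N = c log L, i.e. L^(2 + c log p*)."""
    p_star = 1.0 - math.exp(-beta * (mu + theta_star))
    return L ** 2 * p_star ** (c * math.log(L))
```

With the default slack of 0.01 the bound is exactly $L^{-1.03}$, whatever the values of $\beta$, $\mu$ and $\theta^*$.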



6. Fermions 

We can use the same techniques to consider the alternative model of two-point 
random variables (still in one dimension). We make the following observations, 
which ensure that the above proofs will carry through. 

1. We adapt the proof of Lemma 4, introducing, for $0<p<1$:

Here $E_p$ stands for the entropy $-p\log p-(1-p)\log(1-p)$. Again, it is true that for $x_1(M)\in D_{p,M}$, $P_p(x_1(M))>\exp(-ME_p(1+\epsilon))$, where $P_p$ is the Bernoulli measure on $\mathcal K_-=K_-^{\mathbb Z}$, with $P_p(K_l=0)=1-p$, $P_p(K_l=1)=p$. The assertion of Lemma 4 then follows in the same way as before.
2. For a string $k=(k_i,\ i\in\mathbb Z)\in\mathcal K$, we define

$$y_i^L(k_i)=-\log\mathcal P^L(K_i=k_i)-E_i^L=\big(k_i-\mathbb E^LK_i\big)\log\big(1/\mathcal P^L(K_i=1)-1\big),$$

and for the random string $K$, we define $Y_i^L(K_i)$ in the same fashion.
3. For a random variable $K$ taking value 0 with probability $1-p$ and 1 with probability $p$, if $Y(k)=-\log P(K=k)-E_p$, then

$$\mathbb E(Y(K))^4=p(1-4p+6p^2-3p^3)\log(1/p-1)^4.$$

Since $1-4p+6p^2-3p^3\le1$ for $p\in[0,1]$, making the substitution $y=\log(1/p-1)$ shows that, for $p<1/2$, $p\log(1/p-1)^4=y^4/(1+e^y)<24$. By symmetry, the same result holds for $p>1/2$. Hence the proof of Lemma 5 goes through.

4. Since we now deal with finite alphabets only, the proof of Lemma 6 simplifies. We do not need the truncation argument previously described, and our observations about representative sets will go through as before.

5. The upper bound in Lemma 3 is proved in the same way, since the uniform bound $\max_{i,j}\mathcal P^L(K_i=j)\le\max\big(1/(1+e^{-\beta(\mu+\theta^*)}),\,1/(1+e^{\beta\mu})\big)$ holds.

7. Adaptations to the higher-dimensional case

As in [Q], the generalization to higher dimensions goes through in a rather straightforward fashion.
1. The proof of Lemma 4 carries through; we still divide the larger region into sets $J_i=\{u=(u_1,\dots,u_d)\in\mathbb Z^d:\ \|u/L\|\in(u_i,u_{i+1})\}$ on which the variables are nearly IID. In general we need to replace $M$ by $M^d$, so for example:

$$D_{\gamma,M}=\Big\{x_{\mathbf 1}(M):\ \sum_{i\in B_{\mathbf 1}(M)}x_i\le M^d\Big(\frac{1}{e^\gamma-1}+\frac{\epsilon E_\gamma}{\gamma}\Big)\Big\}.$$

We introduce $M(i)=\big(d(1-\epsilon)\log L/E_{u_{i+1}}\big)^{1/d}$.

2. The proof of Lemma 5 goes through as before, since the uniform bound on the fourth moment of $Y_j$ still holds.

3. We can extend the definition of the waiting time required in Section 5. Writing $v=(v_1,\dots,v_d)\ge0$ to mean that $v_l\ge0$, $1\le l\le d$, and with $|v|_+=\max_lv_l$:

$$T_{n,u}(k)=\inf\{|v|_+:\ v\ge0,\ v\neq0,\ k_{u+v}(n)=k_u(n)\}.$$

4. The upper bound in Lemma 3 is proved in the same way, since a uniform bound on $\max[\mathcal P^L(K_u=j),\ u\in\mathbb Z^d,\ j\in K_\pm]$ holds.




8. Lempel-Ziv parsing 

Now we establish the Lempel-Ziv parsing algorithm for one-dimensional free quantum systems. We use the notation from Sections 4-6. Recall that the algorithm takes a string (or a 'message') $k_1(L)$ and parses it into words; at each stage, we add a marker, so that the parsed block is the shortest word not already seen.



Definition 5 (Lempel-Ziv parsing). We parse the string $k_1(L)=(k_1,\dots,k_L)$ into words:

$$k_1(L)=\{k_{t(1)}(l(1));\ k_{t(2)}(l(2));\ \dots;\ k_{t(c)}(l(c));\ k_{t(c+1)}(r)\},$$

according to the rule: $t(1)=1$, $l(1)=1$, $t(i+1)=t(i)+l(i)$,

$$l(i+1)=\min\{m\ge1:\ k_{t(i+1)}(m)\notin\{k_{t(1)}(l(1)),\dots,k_{t(i)}(l(i))\}\},$$

where $k_{t(c+1)}(r)$ is the remaining (possibly empty) word, $r=L-t(c+1)+1$, and $c+1$ ($=c(k,L)+1$) is the total number of parsed words.

As was noted in the Introduction, this parsing rule is associated with a data-compression algorithm which is asymptotically efficient (it achieves the limit set by the entropy) for ergodic processes. The algorithm relies on the fact that each word $k_{t(i)}(l(i))$ can be described by first giving the point in the string between 1 and $t(i)\le L$ where the block $k_{t(i)}(l(i)-1)$ previously occurs, and then giving the extra symbol which is different. Thus we require $\log L+1$ symbols to specify each parsed word in $k_1(L)$, and the total length of the compressed message will be $c(k,L)(\log L+1)$; cf. Shields [Sh3], Chapter 11.
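Theorem 5. For the one-dimensional quantum free ensemble, for all $\zeta>0$,

$$\lim_{L\to\infty}\frac{c(K_1(\zeta L))\log L}{L}=h_\pm^{(B(\zeta))},\quad\mathcal P_\pm^{\times L}\text{-a.s.}\qquad(6)$$

Under Assumptions 2 and 3,

$$\lim_{L\to\infty}\frac{c(K_1(\zeta L))\log L}{L}=h_\pm,\quad\mathcal P_\pm^{\times L}\text{-a.s.}$$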

Proof. We know that the right-hand side is $\lim_{L\to\infty}\sum_{i=1}^{\zeta L}E_i^L/L$, which represents the data-compression limit. That is, Shannon's Noiseless Coding Theorem (see, for example, Theorem 5.3.1 of [C-T]) states that the expected length of any uniquely decipherable code for a random variable $X$ is greater than or equal to the entropy of $X$.

Therefore, to prove Equation (6) it remains to establish the upper bound $\limsup_{L\to\infty}c(K_1(\zeta L))\log L/L\le h_\pm^{(B(\zeta))}$. We prove this using analysis similar to that of Section 4. As before, the proof goes in the same way for all values of $\zeta$, so we fix $\zeta=1$. Once again, we split the interval $[0,1]$ into subintervals $J_i=(u_i,u_{i+1})$, and for each $i$ write $k_{J_i}$ for $(k_{Lu_i},\dots,k_{Lu_{i+1}-1})$ and set $G_i=\{t(j):\ Lu_i\le t(j)<Lu_{i+1}\}$ (the start-points of words which lie within the sub-interval). We also put $N_i=\{r\in G_i:\ \sum_{s=r}^{r+l(r)-1}E_s^L<(1-\epsilon)\log L\}$, where $l(r)$ denotes the length of the word starting at $r$.

In the spirit of Lemma 4, we first observe that the cardinality $|N_i|<L^{1-\epsilon}$, since again these parsed words are short distinct strings in the typical set. Then, considering the entropy present in these parsed words, we deduce that

$$\sum_{r\in G_i}\ \sum_{s=r}^{r+l(r)-1}E_s^L\ge(1-\epsilon)\log L\,\big(|G_i|-|N_i|\big).$$
s=r J 

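On rearranging we deduce that

$$\limsup_{L\to\infty}\frac{|G_i|\log L}{L}\le\frac{1}{1-\epsilon}\lim_{L\to\infty}\frac1L\sum_{Lu_i\le s<Lu_{i+1}}E_s^L+\epsilon.$$

The theorem follows by summing over the intervals $J_i$, since $\sum_i|G_i|=c(k,L)$. We can deal with the case of $\zeta/\log L\to0$ as before. □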
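Theorem 5 can be illustrated by simulation. The sketch below, which is not from the paper, samples a bosonic string with $\zeta=1$ under the illustrative choices $\beta=1$, $\mu=0.5$ and $\theta(t)=t$. It counts the complete Lempel-Ziv words $c(K_1(L))$ as in Definition 5 and compares $c\log L/L$ with $(1/L)\sum_iE_i^L$. The function names are hypothetical.

```python
import math
import random

# Illustrative simulation; the names and the choices beta = 1, mu = 0.5, theta(t) = t are hypothetical.
def lz_word_count(xs):
    """c(k, L): the number of complete words in the Lempel-Ziv parsing of Definition 5."""
    seen, start, c = set(), 0, 0
    for end in range(1, len(xs) + 1):
        block = tuple(xs[start:end])
        if block not in seen:
            seen.add(block)
            c += 1
            start = end
    return c


def theorem5_sides(L, beta, mu, theta, rng):
    """One bosonic sample with zeta = 1: returns (c(K_1(L)) log L / L, (1/L) sum_i E_i^L)."""
    ks, entropy = [], 0.0
    for i in range(1, L + 1):
        a = beta * (mu + theta(i / L))
        ks.append(int(-math.log(1.0 - rng.random()) / a))  # geometric, P(K_i >= k) = e^{-a k}
        entropy += -math.log1p(-math.exp(-a)) + a / math.expm1(a)
    return lz_word_count(ks) * math.log(L) / L, entropy / L


lz_rate, h = theorem5_sides(20000, 1.0, 0.5, lambda t: t, random.Random(5))
print(lz_rate, h)
```

Because the convergence of Lempel-Ziv parsing is slow (as noted in the Introduction), one should expect the ratio of the two printed numbers to differ noticeably from 1 at moderate $L$.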